The problem of minimizing coding or quantizing noise in a communication
system is posed in a general setting. It is shown that if the messages to
be transmitted are sample sequences drawn from a discrete-time random 
process meeting a certain simply stated criterion of "randomness" and if 
there exists a quantized communication system which is optimal in that it 
introduces a minimum amount of coding noise, then this optimal system 
can be realized using a transmitter of special form. Specifically, the opti- 
mum transmitter is one which quantizes each message sample according to a 
scheme that depends only upon the quantized material already transmitted, 
rather than upon the (unquantized) material that has been previously offered 
for transmission. It follows that only digital storage is required at the 
transmitter or receiver. If the receiver is limited, a priori, to have only a given 
finite amount of storage, and if the system is optimum within this con- 
straint, the transmitter need have only the same amount of storage. 

I. INTRODUCTION: THE MODEL

Shannon's theory of communication shows how to defeat noise introduced
in a communication medium by restricting the repertoire of transmitted
signals to a discrete set.¹ If the messages to be transmitted are
not already in an appropriately discrete form, noise in the medium is 
then eliminated only at the expense of noise, here called coding noise, 
caused by the failure of the restricted family of available signals to 
represent faithfully the full family of possible messages. The amount of 
coding noise introduced is of course subject to control by design. 

This paper considers one aspect of the problem of minimizing coding 
noise. Noise in the medium is not considered. The paper limits attention 
to systems in which the random process representing the message is a 
discrete-time or sampled-data process. The sampling noise caused by 
creating such a process out of a continuous-time process is not considered.
The problem of selecting a coding scheme that maximizes the rate of
communication over a noisy channel is not considered. Rather, the paper
starts at the point at which a coding scheme has been found that is optimum
according to a fairly general criterion of fidelity. What is then shown is
that the transmitter and receiver — encoder and decoder — of the system 
are of a special form. 

A Q-coded communication system is defined by a discrete set Q and by
three jointly distributed random processes,
$\{x_n, q_n, y_n \mid n = 0, \pm 1, \pm 2, \ldots\}$.
For purposes of this paper, the set Q will be either

(i) the set $\{1, 2, \ldots, M\}$, where M is a given integer > 1, or
(ii) the set $\{1, 2, 3, \ldots\}$ of all positive integers.

The process $\{x_n\}$ represents periodic samples derived from the message
offered for transmission; each $x_n$ is a real random variable. $\{q_n\}$ represents
the transmitted signals; for each n, $q_n$ is a random variable taking values
from the set Q and measurable on the sample space of
$\{x_n, x_{n-1}, x_{n-2}, \ldots\}$. That is, for each n, the value of the integer variable $q_n$
depends only upon, and is determined (apart perhaps from events of
probability zero) by, the present and past of the message. $\{y_n\}$ represents
the version of the message reconstructed at the receiver; for each n, $y_n$
is a real random variable measurable on the sample space of
$\{q_n, q_{n-1}, q_{n-2}, \ldots\}$. Therefore for each n, $y_n$ depends only upon, and is determined
(apart perhaps from events of probability zero) by, the present and
past of the transmitted signal.

The model at this point is very general. It provides that at each time
n a discrete-valued random variable $q_n$ be generated in some way out of
the material $\{x_n, x_{n-1}, x_{n-2}, \ldots\}$ then available from the message
process, and that subsequently at the receiver a $y_n$ be generated out of the
material $\{q_n, q_{n-1}, \ldots\}$ there currently available. If all three processes
$\{x_n, q_n, y_n\}$ are stationary we can call the system stationary. The question
of stationarity does not enter in what follows.

What remains to be specified in this model is that in some sense the
process $\{y_n\}$ is to represent the process $\{x_n\}$. At the start it appears
natural to consider three cases; it develops that two are simply special 
cases of the third, one of them not interesting in the framework of this 
paper. 

We start with a given sequence $\{\psi_n \mid n = 0, \pm 1, \pm 2, \ldots\}$ of functions,
in which each $\psi_n$ is a real-valued Borel measurable function $\psi_n(x, y)$ of
the real variables x, y. The use of a sequence $\{\psi_n\}$ here is a largely
decorative generality that costs nothing. The conventional case is that in
which all $\psi_n$ are the same function $\psi$. These functions define a fidelity
criterion as follows:

Case (i), the delay-free case:

Here we choose to regard $y_n$ as a replica of $x_n$, and evaluate our
communication system at each time n by the quantity

$E\{\psi_n(x_n, y_n)\}$, (1)

where E denotes expectation over the message ensemble.

Case (ii), the case of fixed delay:

Here we are given a fixed integer $d \ge 0$ and we choose to regard $y_n$
as a replica of $x_{n-d}$, thus allowing $q_n$ to take advantage not only of
$\{x_{n-d}, x_{n-d-1}, \ldots\}$ (the present and past of $x_{n-d}$) but also of
$\{x_n, x_{n-1}, \ldots, x_{n-d+1}\}$ (a limited span of the "future" of $x_{n-d}$)
in representing $x_{n-d}$.
Here the criterion relative to $x_{n-d}$ is (by a convention we will use with
respect to indices)

$E\{\psi_n(x_{n-d}, y_n)\}$. (2)

If d = 0, this case reduces to case i.

Case (iii), block encoding with cycle time c:

This is the situation that arises naturally in Shannon's theory. We
are given a fixed integer $c \ge 1$, and the transmission process is repetitive
with a cycle of length c. By a choice of time origin, we can describe it as
follows. Let $Q_1$ be a discrete set with $M_1 < \infty$ members. At time 0 the
transmitter examines $\{x_0, x_{-1}, \ldots\}$ and generates a $Q_1$-discrete variable
which we shall call $\hat{q}_0$. At time c, the transmitter then examines
$\{x_c, x_{c-1}, \ldots\}$ and produces $\hat{q}_1$; the process repeats with period c. For
transmission, the random variable $\hat{q}_0$ is encoded into the string
$\{q_c, q_{c-1}, \ldots, q_1\}$ of random variables each being Q-discrete, where
$M^c \ge M_1$. At time c, all of $\hat{q}_0, \hat{q}_{-1}, \ldots$ are available at the
receiver, being represented by the sequence $\{q_c, q_{c-1}, q_{c-2}, \ldots\}$. From
these, the sequence $\{y_{2c-1}, y_{2c-2}, \ldots, y_c\}$ is generated, representing
$x_0, x_{-1}, \ldots, x_{-c+1}$, respectively. We think of these y's as being
presented to the output of the receiver in the order of their indices,
$y_c$ at time c, and so on.

If one follows through the functional dependencies here, one sees that
indeed the processes $\{x_n, q_n, y_n\}$ are so related that each $q_n$ depends at
most upon $\{x_n, x_{n-1}, \ldots\}$, and each $y_n$ at most upon $\{q_n, q_{n-1}, \ldots\}$.
Indeed, except at times which recur with period c, $q_n$ is not "up to
date," depending in fact only on x's strictly prior to $x_n$. Similarly, $y_n$ is
only periodically up to date; at other times it depends only upon q's
that are actually earlier than $q_n$.

In the situation as just described, the criterion of fidelity becomes
$E\{\psi_n(x_{n-2c+1}, y_n)\}$. Case iii is then also a special case of case ii, in which
$d = 2c - 1 \ge 1$. What makes it special is that in case ii, $q_n$ and $y_n$
are permitted to be up to date at each value of n, whereas in case iii
the block coding process restricts the currency of the data upon which
most of the q's and y's depend.
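
The index bookkeeping of case iii can be checked mechanically. The sketch below is only an illustration and assumes the reading of the text in which the block examined at time 0 is delivered as $y_c, \ldots, y_{2c-1}$, standing for $x_{-c+1}, \ldots, x_0$; the function names are hypothetical.

```python
def block_delays(c):
    """Map each output time n of one block to the message index it represents.

    Assumption (taken from the description of case iii): the outputs
    y_c, ..., y_{2c-1} stand for x_{-c+1}, ..., x_0 respectively.
    """
    pairs = {}
    for k in range(c):
        n = c + k          # time at which y_n is presented
        i = -c + 1 + k     # index of the message sample x_i that y_n replicates
        pairs[n] = i
    return pairs


def delays(c):
    """Set of delays n - i occurring within one block."""
    return {n - i for n, i in block_delays(c).items()}


# Every output in the block lags its message sample by the same d = 2c - 1.
for c in (1, 2, 5):
    assert delays(c) == {2 * c - 1}
```

Under this assumption the delay is the same for every sample of the block, which is what allows case iii to be treated as a special case of case ii.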

Actually, case iii as just described will turn out not to be covered,
in general, by the theorems to be proved. This happens because, as will
later be stated more precisely, we are interested only in communication
systems that minimize (2) for each n, in comparison with all possible
competing systems. Clearly, to impose the restrictions immanent in
case iii upon one's repertoire of coding schemes limits the domain
within which a minimum is to be sought. The system that brings 
about an absolute minimum is simply not, in general, to be found 
in this restricted domain. 

The previous observation is not to be entered as a criticism of Shannon's
theory. Typically, in a noisy medium, it is necessary to use a
highly redundant encoding $\{q_c, q_{c-1}, \ldots, q_1\}$ to represent the block
variable $\hat{q}_0$ generated at time 0, so that
the inefficiencies (as measured by expression 2) that are imposed by
the block-coding process are needed in order to ensure that the $y_n$
in (2) is an approximately error-free replica of $x_{n-d}$. We must remember
that (2) measures the noise introduced by the coding process, not by 
the noisy medium. It is interesting to a designer only if the latter 
noise has been eliminated. The price of this elimination is that one 
may not be able to minimize (2) in competition with systems that 
are not restricted to be of block coding form. 

A true engineering solution to the problems reflected in the remarks 
immediately above would consider (2) in which the expectation is 
taken over the joint ensemble of message and noise. The solution 
should balance coding noise against channel noise at, say, a fixed delay, 
to minimize (2). This paper is very far from solving such a problem. 

It does not follow that the results of this paper are without interest
in the search for coding schemes to eliminate noise. Given a Q-coded
communication system which does minimize (2), the $\{q_n\}$ process is
in digital form. This $\{q_n\}$ process can then be redundantly encoded
according to Shannon's theory, and recovered with few errors (and
typically much delay) at the receiver. The $\{y_n\}$ process then results
(perhaps delayed) and has few errors. Then (2) does measure the
total amount of noise introduced in this operation.

II. STATEMENT OF RESULTS 

Given the message process $\{x_n\}$, the sequence $\{\psi_n\}$, and the delay
$d \ge 0$, a Q-coded communication system $\{x_n, q_n, y_n\}$ will be called
$\{\psi_n, d\}$-optimal if

(i) For each $n = 0, \pm 1, \pm 2, \ldots$

$E\{|\psi_n(x_{n-d}, y_n)|\} < \infty$, (3)

and

(ii) For any other Q-coded communication system $\{x_n, q'_n, y'_n\}$,

$E\{\psi_n(x_{n-d}, y_n)\} \le E\{\psi_n(x_{n-d}, y'_n)\}$, (4)

for each $n = 0, \pm 1, \pm 2, \ldots$ .

The simplest result of this paper is of such a form as to illustrate
the nature of all of the results. We define a class K of functions $\psi$,
and a class, here called CCD, of message processes $\{x_n\}$, such that
the following theorem is true.

Theorem 1: Let $\{x_n, q_n, y_n\}$ be a given Q-coded communication system
that is $\{\psi_n, 0\}$-optimal. If each $\psi_n \in K$, $n = 0, \pm 1, \pm 2, \ldots$, and if
$\{x_n\} \in$ CCD, then each $q_n$ is equal with probability one to a random variable
measurable on the sample space of $\{x_n, q_{n-1}, q_{n-2}, \ldots\}$.

The force of this theorem is that it simplifies, in principle at least,
the requirements for memory at the transmitter. Only the digital
sequence $\{q_{n-1}, q_{n-2}, \ldots\}$ need be in storage at time n. The proof
of the theorem will also develop a standard structure for the optimum
transmitter that is difficult to summarize easily in a theorem.
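
The structural claim (the transmitter needs only the present sample and the already transmitted digits) can be pictured with a toy two-level system. The sketch below is a hypothetical adaptive delta modulator, not the optimal system of Theorem 1; the class name, the step-adaptation rule, and its constants are assumptions chosen only so that transmitter and receiver share a state computed from past symbols alone.

```python
class DigitalState:
    """State that is a function of previously transmitted symbols only."""

    def __init__(self):
        self.estimate = 0.0   # running reconstruction
        self.step = 1.0       # current quantizer step
        self.last_q = None    # previous symbol, used to adapt the step

    def update(self, q):
        # Hypothetical adaptation rule: widen the step after a repeated
        # symbol, narrow it after a change, and keep it within bounds.
        if self.last_q is not None:
            self.step = self.step * 1.5 if q == self.last_q else self.step * 0.5
        self.step = min(max(self.step, 0.125), 8.0)
        self.estimate += self.step if q == 2 else -self.step
        self.last_q = q
        return self.estimate


def transmit(xs):
    """q_n is chosen from x_n and a state built from q_{n-1}, q_{n-2}, ... only."""
    state = DigitalState()
    qs = []
    for x in xs:
        q = 2 if x >= state.estimate else 1   # Q = {1, 2}
        qs.append(q)
        state.update(q)
    return qs


def receive(qs):
    """y_n is computed from q_n, q_{n-1}, ... with the same state machine."""
    state = DigitalState()
    return [state.update(q) for q in qs]


qs = transmit([0.5, 0.2])
ys = receive(qs)
```

No unquantized past sample is stored anywhere: two different messages that produce the same symbols leave the transmitter in the same state, which is the kind of digital-only memory the theorem describes.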

The definition of the class K is long and is deferred to Section III.
Suffice it here to say that K is a large class that includes the conventional

$\psi^1(x, y) = |x - y|$, $\psi^2(x, y) = (x - y)^2$,

and any other continuous strictly increasing function of $\psi^1$.

We define CCD, and a related class CCDf, thus:

CCD consists of those processes $\{x_n\}$ such that: for each
$n = 0, \pm 1, \pm 2, \ldots$, if z is a random variable measurable on the sample
space of $\{x_{n-1}, x_{n-2}, \ldots\}$, then the probability that $z = x_n$ is zero:

$P\{z = x_n\} = 0$. (5)

CCDf consists of those processes $\{x_n\}$ such that: for each
$n = 0, \pm 1, \pm 2, \ldots$, if A is a finite Borel field or the completion of a finite
Borel field, and if z is a random variable measurable on the smallest
Borel field containing A and the sample space of $\{x_{n-1}, x_{n-2}, \ldots\}$,
then (5) holds.

Read CCD as "continuous conditional distribution." If $\{x_n\} \in$ CCD
and if $x_n$ has a conditional distribution given $\{x_{n-1}, x_{n-2}, \ldots\}$, that
distribution must be continuous.

We now define a more restricted class of Q-coded communication 
systems and a corresponding notion of optimality. 

Given an integer $m \ge 0$, a Q-coded communication system $\{x_n, q_n, y_n\}$
will be said to have decoder memory span m if for each
$n = 0, \pm 1, \pm 2, \ldots$, $y_n$ is measurable on the sample space of
$\{q_n, q_{n-1}, \ldots, q_{n-m}\}$.

A Q-coded communication system $\{x_n, q_n, y_n\}$ will be called
$\{\psi_n, d, m\}$-optimal if it has decoder memory span m, if (3) holds for
every n, and if (4) holds for every n and for every $\{x_n, q'_n, y'_n\}$ which
has decoder memory span m.

In the case of $\{\psi_n, d, m\}$-optimality, then, the competition is restricted
to systems with decoder memory span m. We can put $m = \infty$
to refer to the case of $\{\psi_n, d\}$-optimality defined earlier.

Perhaps our most surprising result is that case ii of our model,
which includes case i as a special case, is also included in case i. This
is shown by Theorem 2.

Theorem 2: Let $\{x_n, q_n, y_n\}$ be a given Q-coded communication system
that is $\{\psi_n, d\}$-optimal. If each $\psi_n \in K$, $n = 0, \pm 1, \pm 2, \ldots$, if M,
the number of elements of Q, is finite, and if $\{x_n\} \in$ CCDf, then each $q_n$ is
equal with probability one to a random variable measurable on the sample
space of $\{x_{n-d}, q_{n-1}, q_{n-2}, \ldots\}$. Furthermore, the system $\{x_n, q'_n, y'_n\}$,
where

$q'_n = q_{n+d}$, $n = 0, \pm 1, \pm 2, \ldots$, (6)

$y'_n = y_{n+d}$,

is a Q-coded communication system that is $\{\psi'_n, 0\}$-optimal, where

$\psi'_n = \psi_{n+d}$, $n = 0, \pm 1, \pm 2, \ldots$ . (7)
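
As a reading aid, the following short computation shows why the shifted system is judged by the delay-free criterion. It assumes that (6) and (7) read $q'_n = q_{n+d}$, $y'_n = y_{n+d}$, and $\psi'_n = \psi_{n+d}$; it is a sketch of the index substitution only, not part of the proof.

```latex
% Delay-free criterion (1) for the shifted system, under the assumed
% readings of (6) and (7):
\begin{align*}
E\{\psi'_n(x_n, y'_n)\}
  &= E\{\psi_{n+d}(x_n, y_{n+d})\} \\
  &= E\{\psi_{n+d}(x_{(n+d)-d}, y_{n+d})\},
\end{align*}
% which is criterion (2) of the original system evaluated at time n + d.
```

Minimizing (2) at every time for the original system is therefore the same as minimizing (1) at every time for the shifted one, provided the shifted $q'_n$ is a legitimate function of the present and past of the message, which is what the first conclusion of the theorem supplies.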

Finally, we state a theorem that includes the two preceding ones. 

Theorem 3: Let $\{x_n, q_n, y_n\}$ be a given Q-coded communication system
that is $\{\psi_n, d, m\}$-optimal. If each $\psi_n \in K$, $n = 0, \pm 1, \pm 2, \ldots$, if
$M < \infty$, and if $\{x_n\} \in$ CCDf, then each $q_n$ is equal with probability one
to a random variable measurable on the sample space of
$\{x_{n-d}, q_{n-1}, \ldots, q_{n-m}\}$ ($\{x_{n-d}\}$ if m = 0). The system as defined by (6) is a Q-coded
communication system with decoder memory span m that is $\{\psi'_n, 0, m\}$-optimal,
where $\psi'_n$ is given by (7). If, in the initial hypotheses, d = 0,
then it suffices that $\{x_n\} \in$ CCD and the restriction $M < \infty$ may be removed.
If $m < \infty$, the hypothesis $\{x_n\} \in$ CCDf may be replaced by:

For each $n = 0, \pm 1, \pm 2, \ldots$, if z is a random variable that takes
only finitely many values, then $P\{x_n = z\} = 0$.

Theorem 1 shows the basic facts about measurability in the present
context. Theorem 2 adds the fact that delay d > 0 gains no advantage
(since the "future" of $x_{n-d}$ is not known at the receiver, even if it is
at the transmitter). Finally, Theorem 3 includes these facts and shows 
that a limitation on the memory span of the receiver allows a cor- 
responding simplification of the transmitter. 

In the proofs of these theorems it is seen that they are true for classes
of processes slightly larger than CCD or CCDf. In particular, the final
conclusion of Theorem 3 opens the case of finite memory span to any
process $\{x_n\}$ that has a little additive nonsingular Gaussian noise in
each sample. 

III. THE CLASS K 

The class K of cost functions allowed by these theorems can be 
very general. The definition below seems more inclusive than is called 
for by the applications I can think of; at the cost of elaboration, it 
can be enlarged further. 

We let K be the class of all functions $\psi(x, y)$ of two real variables
x, y with the following properties.

(i) $\psi(x, y)$ is continuous;
(ii) for all x, y, $\psi(x, y) \ge 0$;
(iii) for all x, $\psi(x, x) = 0$;

(iv) for each y, there are at most countably many solutions x to the
equation

$\psi(x, y) = 0$, (8)

in the sense that: there exist Borel measurable functions $g_k(y)$,
k = 1, 2, 3, ..., such that if (8) holds, then for some k, $x = g_k(y)$.

(v) If $y_1 \ne y_2$, there are at most countably many solutions to the
equation

$\psi(x, y_1) = \psi(x, y_2)$, (9)

in the sense that: there exist Borel measurable functions $f_k(y, z)$,
k = 1, 2, 3, ..., such that if (9) holds and if $y_1 \ne y_2$, then for some k,
$x = f_k(y_1, y_2)$.

It follows from this definition that $\psi^1 \in K$, where $\psi^1(x, y) = |x - y|$.
Then also $\psi^2 \in K$, where $\psi^2(x, y) = (x - y)^2$. Similarly any other
continuous strictly monotone function of $\psi^1$ is also in K. In all of these
instances, (8) has the unique solution y = x, and (9) has a unique
solution given by $2x = y_1 + y_2$.
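
The two conventional cost functions and the midpoint solution of (9) are easy to check numerically. The following is a minimal sketch; the function names are illustrative and the grid search is only a spot check, not a proof of uniqueness.

```python
def psi1(x, y):
    """Absolute-error cost |x - y|."""
    return abs(x - y)


def psi2(x, y):
    """Squared-error cost (x - y)^2."""
    return (x - y) ** 2


def equal_cost_point(y1, y2):
    """Solution x of psi(x, y1) = psi(x, y2) for y1 != y2, i.e. 2x = y1 + y2."""
    if y1 == y2:
        raise ValueError("equation (9) is only considered for y1 != y2")
    return (y1 + y2) / 2


# Spot check on a grid of multiples of 0.25 (exactly representable floats):
# the only grid point with equal cost to y1 = 1 and y2 = 4 is the midpoint.
grid = [k * 0.25 for k in range(-40, 41)]
ties1 = [x for x in grid if psi1(x, 1.0) == psi1(x, 4.0)]
ties2 = [x for x in grid if psi2(x, 1.0) == psi2(x, 4.0)]
```

Both `ties1` and `ties2` contain only the midpoint 2.5, in agreement with the statement that for these costs (9) has the single solution $2x = y_1 + y_2$.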

IV. PROOFS

Let $\{\Omega, B, P\}$ be a probability space: a set $\Omega$ of points $\omega$, a Borel
field B of subsets of $\Omega$, and a probability measure P on B with respect
to which B is complete. This probability space is assumed given and
fixed.

A random variable x is a real-valued function $x(\omega)$ defined on $\Omega$
and measurable B.

If $F \subset B$ is a Borel field, a random variable x is said to be essentially
measurable F if x is equal with probability one to a random variable x'
which is measurable F. If F is complete, such an x is then itself
measurable F.

If $F \subset B$ is a Borel field and x a random variable, $\{x\} \vee F$ denotes
the smallest Borel field such that: x is measurable $\{x\} \vee F$ and
$F \subset \{x\} \vee F$.

A random variable taking its values in the set Q will be called Q- 
discrete. 

Denote by $[x \mid q, F \mid y, G]$ a mathematical object of the following
kind:

x is a random variable,

q is a Q-discrete random variable,

F is a Borel field, $F \subset B$, and q is essentially measurable on the field
determined by F and the sample space of x,

y is a random variable,

G is a Borel field, $G \subset \{x\} \vee F$, and y is essentially measurable on
the field determined by G and the sample space of q.

For convenience let CQAx ("conditionally quantized approximation
to x") denote the class of all objects of the kind described, based on
the given probability space $\{\Omega, B, P\}$, the given x, and the given set Q.

Given a Q-coded communication system $\{x_n, q_n, y_n\}$, given a delay d
and a memory span m, let $X_{n,d}$ be the sample space of the selection
$\{x_n, x_{n-1}, \ldots\}$ of random variables from which the specific variable
$x_{n-d}$ has been deleted. Let $Q_{n,m}$ be the sample space of the random
variables $\{q_{n-1}, q_{n-2}, \ldots, q_{n-m}\}$. Then it is easy to see that
$\{x_n, q_n, y_n\}$ is a Q-coded communication system with decoder memory span m
if and only if for each $n = 0, \pm 1, \pm 2, \ldots$

$[x_{n-d} \mid q_n, X_{n,d} \mid y_n, Q_{n,m}] \in \mathrm{CQA}x_{n-d}$.

Given $\psi$, a $[x \mid q, F \mid y, G] \in$ CQAx will be called weakly $\psi$-optimal if:

(i) $E\{|\psi(x, y)|\} < \infty$,

(ii) If random variables q' and y' are such that $[x \mid q', F \mid y', G] \in$ CQAx,
then $E\{\psi(x, y)\} \le E\{\psi(x, y')\}$.

The qualifier "weakly" in this definition signals the fact that the 
fields F and G are not allowed to vary in the competition for optimality. 

Lemma 1: If $\{x_n, q_n, y_n\}$ is a Q-coded communication system with
decoder memory span m, and if $\{x_n, q_n, y_n\}$ is $\{\psi_n, d, m\}$-optimal,
then for each n, $[x_{n-d} \mid q_n, X_{n,d} \mid y_n, Q_{n,m}]$ is weakly $\psi_n$-optimal.

Proof: Fix an n; for convenience identify it as n = 0. Suppose that
we are given random variables q' and y', which we shall here call $q'_0$
and $y'_0$, such that

$[x_{-d} \mid q'_0, X_{0,d} \mid y'_0, Q_{0,m}] \in \mathrm{CQA}x_{-d}$.

Define a new Q-coded communication system $\{x_n, q'_n, y'_n\}$ thus:

For n < 0, $q'_n = q_n$, $y'_n = y_n$;

For n = 0, $q'_0$ and $y'_0$ are those above;

For n > 0, $q'_n = 1$ and $y'_n = 0$.

That this is a Q-coded communication system with decoder memory
span m follows at once from the definitions. Furthermore, the sample
space of $\{q'_{-1}, q'_{-2}, \ldots, q'_{-m}\}$ is $Q_{0,m}$. Because $\{x_n, q_n, y_n\}$ is
$\{\psi_n, d, m\}$-optimal, we conclude that $E\{|\psi_0(x_{-d}, y_0)|\} < \infty$ and that

$E\{\psi_0(x_{-d}, y_0)\} \le E\{\psi_0(x_{-d}, y'_0)\}$.

These, however, prove that $[x_{-d} \mid q_0, X_{0,d} \mid y_0, Q_{0,m}]$ is weakly
$\psi_0$-optimal. Clearly this proof can be repeated for any other value of n.

The proof of this lemma indicates, deliberately, the force of the
notion of $\{\psi_n, d, m\}$-optimality for $\{x_n, q_n, y_n\}$. The competing
communication system $\{x_n, q'_n, y'_n\}$ used in the proof sacrificed all reasonable
behavior for n > 0, yet was still allowed to compete at n = 0.
In particular, notice that even if $\{x_n, q_n, y_n\}$ is stationary, it must
compete with nonstationary systems designed to excel at only one
value of n. The theorems of Section II are not proved for stationary
systems which are known only to minimize each $E\{\psi_n(x_{n-d}, y_n)\}$
against competing systems drawn from the class of stationary systems.

Given a Borel field $G \subset B$, we define CCD(G) analogously to CCD:
CCD(G) is the class of all random variables x such that:

If z is a random variable measurable G, then $P\{x = z\} = 0$.

The results of this paper all derive from Theorem 4.

Theorem 4: Let $[x \mid q, F \mid y, G] \in$ CQAx and suppose that it is weakly
$\psi$-optimal. If Q is a finite set, or if $\psi$ is Borel measurable and for each
x is bounded from below, then there exist a Q-discrete random variable
q' and a random variable y' such that

(i) $[x \mid q', G \mid y', G] \in$ CQAx,

(ii) $\psi(x, y') = \psi(x, y)$ with probability one. In particular, also, the
object in i is weakly $\psi$-optimal.

If $\psi \in K$ and $x \in$ CCD(G) then also
(iii) q' = q with probability one, and
(iv) y' = y with probability one.

It then follows that the given q is essentially measurable on the Borel field
$\{x\} \vee G$, determined by G and the sample space of x.
We wish to use the given $[x \mid q, F \mid y, G]$ as a model for some

$[x_{n-d} \mid q_n, X_{n,d} \mid y_n, Q_{n,m}]$

in a Q-coded communication system. Conclusions i and ii show that
for any given n we can find a $q'_n$ essentially measurable $\{x_{n-d}\} \vee Q_{n,m}$
and a $y'_n$ such that, according to the criterion defined by $\psi$, $y'_n$ represents
$x_{n-d}$ as well as $y_n$ did. Without conclusion iii, however, the substitution of
$q'_n$ for $q_n$ can alter the subsequent Borel fields $Q_{n+k,m}$, k > 0, to the point
that we are no longer sure that
$[x_{n+k-d} \mid q'_{n+k}, X_{n+k,d} \mid y_{n+k}, Q_{n+k,m}]$, k > 0,
is weakly $\psi_{n+k}$-optimal. Without iii, therefore, one cannot apply Theorem 4
to prove the other theorems.

It is convenient now to invoke a lemma which is a simple theorem
from measure theory. The lemma provides a standard form for the
variables q and y of an object $[x \mid q, F \mid y, G] \in$ CQAx.

Lemma 2: Given a Q-discrete random variable q and a Borel field G,
if y is a random variable measurable on the Borel field determined by G
and the sample space of q, then there exist random variables $\{z_p, p \in Q\}$
such that

(i) each $z_p$ is measurable G and

(ii) for each $\omega \in \Omega$, if $q(\omega) = p$ then $y(\omega) = z_p(\omega)$.

Conversely, of course, given $\{z_p, p \in Q\}$, each measurable G, any y defined
by ii is measurable on the field determined by G and the sample space of q.

The proof of this lemma consists in showing that the class of random
variables of the type of y above, as the $\{z_p, p \in Q\}$ are selected
arbitrarily from the class of variables measurable G, exhausts the class
of all random variables measurable on the Borel field determined by G
and the sample space of q. The proof is a straightforward exercise in 
measure theory and is omitted. 

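To begin the main argument, given $[x \mid q, F \mid y, G] \in$ CQAx and
a Borel measurable function $\psi(x, y)$, if for each x $\psi(x, y)$ is bounded
from below, or if Q is a finite set, we can define the random variable

$\xi(\omega) = \inf_{r \in Q} \psi(x(\omega), z_r(\omega))$,

where the $\{z_r, r \in Q\}$ are the variables that the preceding lemma associates
with y. Then $\xi$ is measurable $\{x\} \vee G$.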

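Given $p \in Q$ and $r \in Q$, we define sets $T^*_p$, $T_{pr}$, $T_p$ by

$T^*_p = \{\omega \mid \psi(x(\omega), z_p(\omega)) = \xi(\omega)\}$,

$T_{pr} = \{\omega \mid \psi(x(\omega), z_p(\omega)) = \psi(x(\omega), z_r(\omega))\}$,

$T_p = T^*_p - \bigcup_{r \in Q,\, r \ne p} T_{pr}$.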

Clearly each of these sets is measurable $\{x\} \vee G$. $T^*_p$ is the set where
the index p minimizes $\psi(x, z_p)$, and $T_p$ is that subset of $T^*_p$ where this
minimizing index is unique. It follows that if $r \ne p$ then

$T_p \cap T^*_r = \emptyset$, (10)

and as a consequence, $T_p \cap T_r = \emptyset$, $r \ne p$.
Clearly

$T_p \subset T^*_p$.

Also

$T^*_p \cap T_{pr} = T^*_r \cap T_{pr}$, (11)

since either side is the set where an index minimizing $\psi(x, z_s)$ over
$s \in Q$ can be equal either to p or to r.
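
On a finite sample space these sets can be enumerated directly. The sketch below is illustrative only: it assumes that the set written with two indices p, r is the set where the costs of $z_p$ and $z_r$ coincide, and it takes the uniquely minimizing set to be the minimizing set with all such ties removed. The sample values and names are hypothetical.

```python
def psi1(x, y):
    return abs(x - y)


def minimizing_sets(x, z, psi):
    """Enumerate xi and the minimizing sets on the sample space {0, ..., len(x)-1}.

    x: list of sample values x(w); z: dict p -> list of values z_p(w).
    """
    omega = range(len(x))
    Q = sorted(z)
    cost = {p: [psi(x[w], z[p][w]) for w in omega] for p in Q}
    xi = [min(cost[p][w] for p in Q) for w in omega]
    # Points where index p attains the minimum.
    t_star = {p: {w for w in omega if cost[p][w] == xi[w]} for p in Q}
    # Assumed reading: points where indices p and r give equal cost.
    t_pair = {(p, r): {w for w in omega if cost[p][w] == cost[r][w]}
              for p in Q for r in Q if r != p}
    # Points where p is the unique minimizing index.
    t_uniq = {}
    for p in Q:
        ties = set()
        for r in Q:
            if r != p:
                ties |= t_pair[(p, r)]
        t_uniq[p] = t_star[p] - ties
    return xi, t_star, t_pair, t_uniq


x = [0.0, 1.0, 2.0, 3.0]                  # four sample points
z = {1: [0.0] * 4, 2: [2.0] * 4}          # two constant reconstruction levels
xi, t_star, t_pair, t_uniq = minimizing_sets(x, z, psi1)

# Relation (10): a uniquely minimizing set misses every other minimizing set.
assert t_uniq[1] & t_star[2] == set()
# Relation (11): on the tie set, p minimizes exactly where r does.
assert t_star[1] & t_pair[(1, 2)] == t_star[2] & t_pair[(1, 2)]
```

In this example the sample point with x = 1 lies midway between the two levels, so it belongs to both minimizing sets and to the tie set, while the remaining points are uniquely assigned.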

In terms of these sets, the argument to be used can be outlined
briefly. First, one shows that the $T^*_p$ essentially cover $\Omega$, in the sense
that there is a null set N such that

$\Omega - N = \bigcup_{p \in Q} T^*_p$. (12)

This follows without argument, and with $N = \emptyset$, if Q is finite; it results
from $\psi$-optimality in general.
Second, by definition

$T^*_p - T_p \subset \bigcup_{r \in Q,\, r \ne p} T_{pr}$. (13)



3102 THE BELL SYSTEM TECHNICAL JOURNAL, NOVEMBER 19159 

Third, one observes that for $p, r \in Q$ and $p \ne r$, $T_{pr}$ consists of the
set $S_{pr}$,

$S_{pr} = \{\omega \mid z_p(\omega) = z_r(\omega)\}$,

plus a disjoint remainder $T_{pr} - S_{pr}$. The hypothesis $x \in$ CCD(G) allows
one to show that this remainder is a null set. Over the set $S_{pr}$, on the
other hand, the information about x conveyed by the family $\{z_p, p \in Q\}$
is redundant. The hypothesis of $\psi$-optimality can then be violated,
unless $S_{pr}$ is also a null set. It follows then that each $T_{pr}$ is a null set,
and from (12) and (13) then that the $T_p$ partition $\Omega$ apart from a null set.
From this the full theorem follows quickly.
To proceed with (12), given $p \in Q$, let $N_p$ be the set

$N_p = \{\omega \mid q(\omega) = p\} \cap \{\Omega - \bigcup_{r \in Q} T^*_r\}$.

Fix an $\omega \in N_p$; then $y(\omega) = z_p(\omega)$ but $\omega \notin T^*_p$, so that
$\xi(\omega) < \psi(x(\omega), z_p(\omega))$.
It follows that there is some $r \in Q$, $r \ne p$, such that

$\psi(x(\omega), z_r(\omega)) < \psi(x(\omega), z_p(\omega))$, (14)

and indeed, since Q is bounded from below, that there is a least such r,
call it $r^*(\omega)$. Notice that $N_p$ is measurable on the Borel field determined
by the sample space of $\{x\}$, by F, and by G. Since $G \subset \{x\} \vee F$, it follows
that $N_p$ is measurable $\{x\} \vee F$. That subset $R_{pk}$ of $N_p$ where $r^*(\omega) = k$
is empty if k = p; otherwise

$R_{pk} = N_p \cap \{\omega \mid \psi(x(\omega), z_1(\omega)) < \psi(x(\omega), z_p(\omega))\}$ if $k = 1 \ne p$,

$R_{pk} = N_p \cap \{\omega \mid \psi(x(\omega), z_k(\omega)) < \psi(x(\omega), z_p(\omega))\} \cap \bigcap_{j < k} \{\omega \mid \psi(x(\omega), z_j(\omega)) \ge \psi(x(\omega), z_p(\omega))\}$ if $k > 1$, $k \ne p$.

It follows from these equalities that $R_{pk}$ and $r^*$ are measurable $\{x\} \vee F$.
We now define the Q-discrete random variable q' by

If $p \in Q$ and $\omega \in N_p$, $q'(\omega) = r^*(\omega)$;

If $\omega \in \Omega - \bigcup_{p \in Q} N_p$, then $q'(\omega)$ is the least value of $r \in Q$ such that
$\omega \in T^*_r$.

Since the $N_p$ cover the complement of $\bigcup_r T^*_r$, and since Q is bounded
from below, this defines $q'(\omega)$ for each $\omega \in \Omega$; clearly q' is Q-discrete.
Given $k \in Q$, the set where q' = k consists of the union of

$\bigcup_{p \in Q} R_{pk}$

with the set $V_k$, where

$V_1 = T^*_1$,

$V_k = (\Omega - T^*_1) \cap \cdots \cap (\Omega - T^*_{k-1}) \cap T^*_k$, k > 1.

Since each $V_k$ is measurable $\{x\} \vee G \subset \{x\} \vee F$, it follows that q' is
measurable $\{x\} \vee F$. Furthermore, over $\Omega - \bigcup_{p \in Q} N_p$, q' is equal to a
random variable that is measurable $\{x\} \vee G$, since each $V_k$ is measurable
on this latter field.

We now define the random variable y' by

$y'(\omega) = z_{q'(\omega)}(\omega)$, $\omega \in \Omega$.

Then y' is measurable on G and the sample space of q'. It follows that
$[x \mid q', F \mid y', G] \in$ CQAx, and from the hypothesis of weak $\psi$-optimality
then that

$E\{\psi(x, y)\} \le E\{\psi(x, y')\}$. (15)

But now we claim that for all $\omega \in \Omega$

$\psi(x(\omega), y'(\omega)) \le \psi(x(\omega), y(\omega))$. (16)

First, if $\omega \in N_p$, we have

$\psi(x(\omega), y'(\omega)) = \psi(x(\omega), z_{r^*(\omega)}(\omega)) < \psi(x(\omega), z_p(\omega)) = \psi(x(\omega), y(\omega))$, (17)

the inequality being by definition of $r^*$. Therefore strict inequality
prevails in (16) for $\omega \in \bigcup_{p \in Q} N_p$. Consider now an
$\omega \in (\Omega - \bigcup_{r \in Q} N_r) \cap \{\omega' \mid q'(\omega') = p\}$.
For this $\omega$ we have $\omega \in T^*_p$ and
$\psi(x(\omega), y'(\omega)) = \psi(x(\omega), z_p(\omega)) \le \psi(x(\omega), z_r(\omega))$
for any $r \in Q$, by definition of $T^*_p$. But
then (16) follows for this $\omega$ because $y(\omega) = z_r(\omega)$ for some $r \in Q$.
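
For a finite Q the exceptional sets are empty and the construction reduces to re-encoding each sample point with the least index that minimizes the cost. The sketch below illustrates the pointwise inequality (16) in that special case only; the sample values, the starting code `q`, and the function names are hypothetical.

```python
def psi2(x, y):
    return (x - y) ** 2


def reencode(x, z, psi):
    """Assign to each sample point the least index p minimizing psi(x, z_p).

    Assumes Q is finite, so a minimizing index always exists.
    """
    Q = sorted(z)
    q_new = []
    for w in range(len(x)):
        costs = [psi(x[w], z[p][w]) for p in Q]
        q_new.append(Q[costs.index(min(costs))])   # index() returns the first minimizer
    return q_new


def pointwise_cost(x, z, q, psi):
    """Cost psi(x, y) at each sample point when y = z_q."""
    return [psi(x[w], z[q[w]][w]) for w in range(len(x))]


x = [0.0, 1.0, 2.0, 3.0]
z = {1: [0.0] * 4, 2: [2.0] * 4}
q = [2, 1, 1, 2]                          # an arbitrary, non-optimal code
q_prime = reencode(x, z, psi2)

old = pointwise_cost(x, z, q, psi2)
new = pointwise_cost(x, z, q_prime, psi2)
# Pointwise inequality corresponding to (16).
assert all(n <= o for n, o in zip(new, old))
```

Because the inequality holds at every sample point, it survives taking expectations under any probability assignment, which is the step the argument uses next.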

Now from (16), by taking expectations, we conclude the inequality
opposite in sense to (15); hence (15) is an equality, and (16) is then
an equality with probability one. Therefore ii of Theorem 4 is proved.
Now by (17), (16) is a strict inequality over $N = \bigcup_{p \in Q} N_p$. Hence
this latter set is a null set. Therefore i of Theorem 4 is proved, since
q' is equal, over the complement of N, to a variable that is measurable
$\{x\} \vee G$, as we noted earlier. Finally, since

$\Omega - \bigcup_{r \in Q} T^*_r = \bigcup_{r \in Q} N_r = N$,

the $T^*_p$ essentially cover $\Omega$. This is (12), as was to be proved.
It would be possible at this point to invoke the hypotheses $\psi \in K$
and $x \in$ CCD(G) to conclude iv of the Theorem. It will be more efficient
to prove iii and iv together. To do so requires, as our earlier
outline suggests, that we examine the sets $T^*_p \cap T_{pr}$ over which
redundancy prevails (because on $T^*_p \cap T_{pr}$ either of $z_p$ or $z_r$, where
$r \ne p$, could be used to define the same value of y minimizing $\psi(x, y)$).

We have concluded (12), that except for $\omega \in N$, a null set, for each
$\omega$ there is at least one $p \in Q$ such that $\xi(\omega) = \psi(x(\omega), z_p(\omega))$; that is,
the minimizing index is uniquely p for $\omega \in T_p - N$.

Now define, as earlier, for $r \ne p$,

$S_{pr} = \{\omega \mid z_p(\omega) = z_r(\omega)\}$.

Then if $\omega \in T_{pr} - S_{pr}$, we have

$\psi(x(\omega), z_p(\omega)) = \psi(x(\omega), z_r(\omega))$, $z_p(\omega) \ne z_r(\omega)$.

Since $\psi \in K$, it follows that for some k = 1, 2, ... we have

$x(\omega) = f_k(z_p(\omega), z_r(\omega))$. (18)

Now let $A_{kpr}$ be the set of all $\omega$ such that (18) holds. We have just
shown that

$T_{pr} - S_{pr} \subset \bigcup_{k=1}^{\infty} A_{kpr}$. (19)

But now, since $f_k$ is Borel measurable and each $z_p$ is measurable G,
(18) constrains x on $A_{kpr}$ to be equal to a random variable measurable G.
Since $x \in$ CCD(G), then $A_{kpr}$ is a subset of some null set,

$P\{A_{kpr}\} = 0$, k = 1, 2, ...,

and

$\sum_{k=1}^{\infty} P\{A_{kpr}\} = 0$.

This last with (19) makes $P\{T_{pr} - S_{pr}\} = 0$. Indeed, finally, since Q
is countable,

$P\{\bigcup_{p \in Q} \bigcup_{r \in Q,\, r \ne p} (T_{pr} - S_{pr})\} = 0$.

It is important later that by definition, $S_{pr}$ is measurable G and
therefore that, by (19), $T_{pr}$ is essentially measurable G.

We now define a new Q-discrete random variable q'' and a corresponding
y''. The construction depends upon an arbitrarily chosen
$p_0 \in Q$ and an arbitrarily chosen real number a, although the notation
will not emphasize this dependence. Later it will be shown that q'' = q'
and y'' = y' each with probability one, so that the dependence upon
$p_0$ and a is not essential.

Fix a $p_0 \in Q$ and select a real number a. Define the random variable
$z''_{p_0}(\omega)$ by:

if $\omega \in \bigcup_{r \in Q,\, r \ne p_0} T_{p_0 r}$, $z''_{p_0}(\omega) = a$,

otherwise, $z''_{p_0}(\omega) = z_{p_0}(\omega)$.

Then $z''_{p_0}(\omega)$ is measurable G. Define

$z''_p = z_p$, $p \in Q$, $p \ne p_0$.

Then certainly each $z''_p$, $p \in Q$, is measurable G. Define the Q-discrete
random variable $q''(\omega)$ by

If $\omega \in T_{p_0} \cup [(T^*_{p_0} - T_{p_0}) \cap \{\omega' \mid \psi(x(\omega'), a) < \psi(x(\omega'), z_{p_0}(\omega'))\}]$
then $q''(\omega) = p_0$;

if $\omega \in (T^*_{p_0} - T_{p_0}) \cap \{\omega' \mid \psi(x(\omega'), a) \ge \psi(x(\omega'), z_{p_0}(\omega'))\}$
then $q''(\omega)$ is the least value of $p \in Q$ such that $p \ne p_0$ and $\omega \in T^*_p$;

if $\omega \in \Omega - T^*_{p_0}$, then $q''(\omega) = q'(\omega)$.

It is easily seen that this defines q'' for all $\omega \in \Omega$.

We now define the random variable y'' by $y''(\omega) = z''_{q''(\omega)}(\omega)$. Then
y'' is measurable on G and the sample space of q'', so that by construction
$[x \mid q'', F \mid y'', G] \in$ CQAx. Applying the hypothesis of weak
$\psi$-optimality, we conclude that

$\int_{\Omega} [\psi(x, y'') - \psi(x, y)]\, dP = E\{\psi(x, y'')\} - E\{\psi(x, y)\} \ge 0$. (20)

We now partition the domain $\Omega$ of integration into the four sets

$A_1 = T_{p_0}$,

$A_2 = (T^*_{p_0} - T_{p_0}) \cap \{\omega \mid \psi(x(\omega), a) < \psi(x(\omega), z_{p_0}(\omega))\}$,

$A_3 = (T^*_{p_0} - T_{p_0}) \cap \{\omega \mid \psi(x(\omega), a) \ge \psi(x(\omega), z_{p_0}(\omega))\}$,

$A_4 = \Omega - T^*_{p_0}$.

That this is a partition follows from the definition and the fact, already 
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proved, that T Vo C T* . We consider the four resulting integrals 
individually, in the order of the listing. 

If $\omega \in T_{p_0}$ then either $\omega \in T_{p_0} \cap N$, or $\omega \in T_{p_0} - N$. We may ignore
the first case. For the second, by definition of $T_{p_0}$, if $r \neq p_0$,

$$\psi(x(\omega), z_{p_0}(\omega)) < \psi(x(\omega), z_r(\omega)). \tag{21}$$

Also, by definition 

rcO 

and therefore by definition $z''_{p_0}(\omega) = z_{p_0}(\omega)$ and $q''(\omega) = p_0$. Then

$$\psi(x(\omega), y''(\omega)) = \psi(x(\omega), z''_{p_0}(\omega)) = \psi(x(\omega), z_{p_0}(\omega))$$

and from the inequality (21) we conclude that the integrand

$$\psi(x(\omega), y''(\omega)) - \psi(x(\omega), y(\omega)) \leq 0,$$

since $y(\omega)$ is equal to some $z_r(\omega)$, $r \in Q$. Hence the integral over $A_1$ is
not positive.

If $\omega \in A_2$, then by definition $q''(\omega) = p_0$ and

$$y''(\omega) = z''_{p_0}(\omega).$$

Again, we ignore the contribution of $A_2 \cap N$. If $\omega \in A_2 - N$ then by
(13),

Then by definition $z''_{p_0}(\omega) = a$. Hence, the integrand

$$\psi(x(\omega), y''(\omega)) - \psi(x(\omega), y(\omega))$$

$$= [\psi(x(\omega), a) - \psi(x(\omega), z_{p_0}(\omega))] + [\psi(x(\omega), z_{p_0}(\omega)) - \psi(x(\omega), y(\omega))].$$

The first bracket on the right is $< 0$ by definition of $A_2$, and the second
is $\leq 0$ because $\omega \in T^*_{p_0}$ and by definition of $T^*_{p_0}$ we have
$\psi(x(\omega), z_{p_0}(\omega)) \leq \psi(x(\omega), z_r(\omega))$ for all $r \in Q$; among the latter is
$\psi(x(\omega), y(\omega))$. Hence the second integral is not positive, and its
integrand is strictly negative.

Now consider $\omega \in A_3$. We ignore the integral over $A_3 \cap N_1$. If
$\omega \in A_3 - N_1$, then $q''(\omega) = p \neq p_0$ and $\omega \in T^*_p$ for some $p \in Q$. For this
$\omega$ we have

$$\psi(x(\omega), y''(\omega)) = \psi(x(\omega), z''_p(\omega)) = \psi(x(\omega), z_p(\omega)) \leq \psi(x(\omega), z_r(\omega))$$

for all $r \in Q$; here the first equality is by definition of $y''$, the second
by definition of $z''_p$ since $p \neq p_0$, and the inequality is by definition
of $T^*_p$. But the inequality makes the integrand in (20) $\leq 0$, since
$y(\omega) = z_r(\omega)$ for some $r \in Q$. Therefore, the integral over $A_3$ is not
positive.

Over $A_4$, the integrand of (20) is

$$[\psi(x, y'') - \psi(x, y')] + [\psi(x, y') - \psi(x, y)].$$

The second bracket vanishes with probability one by ii of Theorem 4,
already proved. The first bracket is 

and this vanishes for all w e A 4 by the definitions because over A 4 , 
£(o>) < \p(x(co), z Po (co)) so that g'(w) 5^ p ; therefore by definition 

We conclude from these calculations that the integral (20) cannot
be positive. By (20), therefore, the integral vanishes. But the argument
showed that the integrand was $\leq 0$ with probability one, hence indeed,
the integrand vanishes with probability one:

$$\psi(x, y'') = \psi(x, y) \quad \text{with probability one.}$$

In particular, over $A_2$, the integrand was strictly $< 0$. Therefore
$A_2$ has probability zero. We shall now exploit this fact.

In the argument above, $a$ was any real number. Let $\{a_n\}$ be a countable
dense set of real numbers and let

$$W_n = \{\omega \mid \psi(x(\omega), a_n) < \psi(x(\omega), z_{p_0}(\omega))\}.$$

We have just proved that $P\{A_2\} = 0$, which is to say that we could
have proved, for each $n$, that

$$P\{(T^*_{p_0} - T_{p_0}) \cap W_n\} = 0.$$

Then also

$$N_2 = \bigcup_n (T^*_{p_0} - T_{p_0}) \cap W_n$$

is a null set. Now if $\omega \in N_2$, then $\omega \in T^*_{p_0} - T_{p_0}$ and also there is some
number $a_n$ such that

$$\psi(x(\omega), a_n) < \psi(x(\omega), z_{p_0}(\omega)). \tag{22}$$

Conversely, if $\omega \in T^*_{p_0} - T_{p_0}$ and there is a number $a_n$ such that (22)
is true, then $\omega \in N_2$. Therefore if $\omega \in (T^*_{p_0} - T_{p_0}) - N_2$, then for every
number $a_n$ we have

$$\psi(x(\omega), a_n) \geq \psi(x(\omega), z_{p_0}(\omega)). \tag{23}$$

Given an $\omega \in (T^*_{p_0} - T_{p_0}) - N_2$, choose a subsequence of the $a_n$
converging to $x(\omega)$; one exists because $\{a_n\}$ is dense. Assume
that $\psi \in K$. Then $\psi$ is continuous and from (23) we have, in the limit
along this subsequence,

$$0 = \psi(x(\omega), x(\omega)) = \lim \psi(x(\omega), a_n) \geq \psi(x(\omega), z_{p_0}(\omega)) \geq 0.$$

Notice, incidentally, that it suffices here for each $x$ that $\psi(x, y)$ be
continuous for $y$ in some neighborhood of $x$. This is an example of one
way in which $K$ can be enlarged.

From this and item iv in the definition of $K$, there is some integer $k$
such that

$$x(\omega) = g_k(z_{p_0}(\omega)). \tag{24}$$

Let $C_k$ be the set of all $\omega$ such that (24) holds. Since $g_k$ is Borel
measurable, over $C_k$, (24) constrains $x$ to be equal to a function measurable
$G$. If $x \in CCD(G)$, then $C_k$ is a null set. But we have just shown above
that

$$(T^*_{p_0} - T_{p_0}) - N_2 \subset \bigcup_k C_k.$$

Therefore

$$P\{T^*_{p_0} - T_{p_0}\} = 0.$$

Since $p_0$ was arbitrary, this can be proved for each $p \in Q$; therefore
from (12) the $T_p$, $p \in Q$, essentially cover $\Omega$. We proved along with
the definitions that the $T_p$ are pairwise disjoint, hence they partition
$\Omega - N_3$, where $N_3$ is some null set.

We continue the argument using the selected $p_0$. For $\omega \in \Omega - N_3$,
either $\omega \in T_{p_0}$ or $\omega \in T_r$ where $r \in Q$ but $r \neq p_0$. In this latter case, however,
as we proved with the definitions, $\omega \in \Omega - T^*_{p_0}$; then by definition
$q''(\omega) = q'(\omega)$. If $\omega \in T_{p_0}$, by the definitions $q''(\omega) = q'(\omega) = p_0$.
Therefore

$$q'' = q' \quad \text{with probability one.} \tag{25}$$

Furthermore we know that if $\omega \in T_p$, then $q'(\omega) = p$. From (25)

$$y''(\omega) = z''_{q'(\omega)}(\omega). \tag{26}$$

If $\omega \in \Omega - T_{p_0}$, except at most on a null set we have $q''(\omega) \neq p_0$ and
from (26) and the definition of $z''_p$

$$y''(\omega) = z''_{q'(\omega)}(\omega) = z_{q'(\omega)}(\omega) = y'(\omega), \quad \omega \in (\Omega - T_{p_0}) - N_5 \tag{27}$$

where $N_5$ is a null set. Now if $\omega \in T_{p_0} - N$, we showed earlier that
$z''_{p_0}(\omega) = z_{p_0}(\omega)$. Hence the equalities in (27) hold for $\omega \in T_{p_0} - N$ as
well, so that

$$y'' = y' \quad \text{with probability one.} \tag{28}$$

Equalities (25) and (28) free the constructions from any dependence,
except on a null set, upon the initially selected $p_0 \in Q$ and $a$. We need
the Theorem to make identification with $q$ and $y$.

Let $S_p$ be that subset of $T_p$ where $q(\omega) \neq p$. Then if $\omega \in S_p$, by
definition of $T_p$,

$$\psi(x(\omega), y'(\omega)) = \psi(x(\omega), z_p(\omega)) < \psi(x(\omega), z_{q(\omega)}(\omega)) = \psi(x(\omega), y(\omega)).$$

From ii of Theorem 4, then, $P\{S_p\} = 0$, and $P\{\bigcup_{r \in Q} S_r\} = 0$. Since
the $T_p$, $p \in Q$, essentially partition $\Omega$, it follows that $q' = q$ with
probability one, and at once that $y(\omega) = z_{q(\omega)}(\omega) = z_{q'(\omega)}(\omega) = y'(\omega)$ with
probability one. These conclusions are iii and iv of the Theorem, the
proof of which is now complete.

To prove Theorem 1, let $\{x_n, q_n, y_n\}$ be a given $Q$-coded communication
system that is $\{\psi_n, 0\}$-optimal. Given $n$, by Lemma 1,

$$[x_n \mid q_n, X_{n,0} \mid y_n, Q_{n,\infty}] \in CQAx_n$$

and is weakly $\psi_n$-optimal. If $\psi_n \in K$ and $x_n \in CCD(Q_{n,\infty})$, Theorem 4
proves that $q_n$ is measurable on $\{x_n\} \vee Q_{n,\infty}$. But $Q_{n,\infty}$ is the sample
space of $\{q_{n-1}, q_{n-2}, \ldots\}$, and is therefore contained in the sample
space of $\{x_{n-1}, x_{n-2}, \ldots\}$, since by hypothesis $\{x_n, q_n, y_n\}$ is a $Q$-coded
communication system. The hypothesis $\{x_n\} \in CCD$ of Theorem 1 then
implies that for the given $n$, $x_n \in CCD(Q_{n,\infty})$, and Theorem 4 establishes
Theorem 1.

Turning to Theorem 3, let $\{x_n, q_n, y_n\}$ be a given $Q$-coded
communication system with decoder memory span $m$, and suppose that
it is $\{\psi_n, d, m\}$-optimal. By Lemma 1, then, given $n$,
$[x_{n-d} \mid q_n, X_{n,d} \mid y_n, Q_{n,m}] \in CQAx_{n-d}$ and is weakly $\psi_n$-optimal. By the
hypotheses of Theorem 3, $\psi_n \in K$, and $\{x_n\} \in CCDf$. Consider $Q_{n,m}$, the
sample space of $\{q_{n-1}, q_{n-2}, \ldots, q_{n-m}\}$. Suppose first that $m > d$; then
this sample space is the smallest Borel field which contains both the
sample space of $\{q_{n-1}, \ldots, q_{n-d}\}$ and that of $\{q_{n-d-1}, \ldots, q_{n-m}\}$. Since
$M < \infty$, the first of these is a finite field, and the second is a subfield
of $\{x_{n-d-1}, x_{n-d-2}, \ldots\}$ (since $\{x_n, q_n, y_n\}$ is indeed a $Q$-coded
communication system). The hypothesis $\{x_n\} \in CCDf$ then implies that
$x_n \in CCD(Q_{n,m})$. If $m \leq d$, the subfield of $\{x_{n-d-1}, \ldots\}$ is empty, but the
reasoning and conclusion are still valid. Then Theorem 4 applies and we
conclude that $q_n$ is measurable on the sample space of
$\{x_{n-d}, q_{n-1}, \ldots, q_{n-m}\}$. This is the first conclusion of Theorem 3. We
note now that a weaker hypothesis than $\{x_n\} \in CCDf$ could suffice here.
Indeed, if $m < \infty$, it is sufficient that: if $A$ is a finite field then
$x_n \in CCD(A)$. This is the final conclusion of Theorem 3.

Given that $q_n$ is essentially measurable on $\{x_{n-d}, q_{n-1}, \ldots, q_{n-m}\}$,
for each $n$, we conclude by induction that $q_n$ is essentially measurable
$\{x_{n-d}, x_{n-d-1}, q_{n-2}, \ldots, q_{n-m-1}\}$, $\ldots$ and finally then that $q_n$ is
essentially measurable $\{x_{n-d}, x_{n-d-1}, \ldots\}$. Define

$$q'_n = q_{n+d},$$

$$y'_n = y_{n+d}, \quad n = 0, \pm 1, \ldots.$$

Then it is a simple translation of notation to verify that $\{x_n, q'_n, y'_n\}$
is a $Q$-coded communication system with decoder memory span $m$
that is $\{\psi'_n, 0, m\}$-optimal, where $\psi'_n = \psi_{n+d}$, $n = 0, \pm 1, \ldots$. This
is the second conclusion of Theorem 3.

Finally, if $d = 0$, then "$\{x_n\} \in CCDf$" may be replaced by:
"$\{x_n\} \in CCD$." Then $M$ is unrestricted, since no "future" is involved that
must be restricted to a finite field. This completes the proof.

Theorem 2 is a limiting case of Theorem 3, proved by putting $m = \infty$
everywhere in the proof of Theorem 3.

V. A COROLLARY 

It is a consequence of Lemma 2 and of the proof of Theorem 4 that,
given $\omega$, in a set of probability one, $q(\omega)$ is that unique value of $p$ which
minimizes $\psi(x(\omega), z_p(\omega))$. (This was remarked in connection with
equation 25.) Applying this to the situation of Theorem 1, one sees
that the transmitter of a delay-free $Q$-coded communication system
$\{x_n, q_n, y_n\}$ satisfying Theorem 1 has the block diagram form shown
in Fig. 1. (If $d > 0$, one simply puts an analog delay line in the input
lead, ahead of the rest of the system.)

This block diagram can be described thus: at time subsequent to
$t = n - 1$ and prior to $t = n$, the transmitter has in its digital store
the values $q_{n-1}, q_{n-2}, \ldots$ of the previously transmitted signals. From
these, quantities $z_{1,n}, z_{2,n}, z_{3,n}, \ldots$ are constructed. These are the
$z_p$ of Lemma 2, for the particular random variable $y_n$. When $x_n$ becomes
available, quantities $\psi_n(x_n, z_{1,n}), \psi_n(x_n, z_{2,n}), \ldots$ are constructed and
the comparator identifies the least of these (unique with probability
one). The transmitted $q_n$ is that value of the index which identifies
the least $\psi_n(x_n, z_{p,n})$. This index is transmitted to the receiver as $q_n$
and is also stored in the transmitter's memory for the next cycle.
The receiver can be realized using a portion of the transmitter, as
suggested in Fig. 2. Each function generator in these diagrams can
of course be nonstationary. Connections to a master "clock" are not
shown.

Fig. 1. Generalized form of optimum transmitter.
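
The cycle just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system of Fig. 1 itself: the function names, the rule `toy_levels` that builds the candidate values $z_{p,n}$ from the stored indices, and the squared-error distortion are all hypothetical choices made for the example. The one feature taken from the text is structural: both transmitter and receiver keep only the past indices $q_{n-1}, q_{n-2}, \ldots$ in their stores.

```python
from typing import Callable, List, Sequence

# Hypothetical increments, one per index value p = 0, 1, 2 (an assumption
# made for this sketch; the paper does not prescribe any particular rule).
STEPS = (-1.0, 0.0, 1.0)


def toy_levels(history: Sequence[int]) -> List[float]:
    """Build candidate outputs z_{p,n} from the stored indices only."""
    center = sum(STEPS[q] for q in history)
    return [center + s for s in STEPS]


def squared_error(x: float, y: float) -> float:
    """Example distortion psi(x, y) = (x - y)^2."""
    return (x - y) ** 2


def run_transmitter(
    samples: Sequence[float],
    levels_from_history: Callable[[Sequence[int]], List[float]],
    distortion: Callable[[float, float], float],
) -> List[int]:
    """Emit, for each sample, the index of the least-distortion candidate."""
    history: List[int] = []  # digital store: past indices, nothing analog
    for x in samples:
        levels = levels_from_history(history)
        costs = [distortion(x, z) for z in levels]
        # Comparator: ties go to the lowest index here; under the paper's
        # hypotheses the minimum is unique with probability one.
        q = min(range(len(levels)), key=costs.__getitem__)
        history.append(q)  # store the index for the next cycle
    return history


def run_receiver(
    indices: Sequence[int],
    levels_from_history: Callable[[Sequence[int]], List[float]],
) -> List[float]:
    """Rebuild the outputs y_n using the same level generator."""
    history: List[int] = []
    outputs: List[float] = []
    for q in indices:
        outputs.append(levels_from_history(history)[q])
        history.append(q)
    return outputs
```

For instance, `run_transmitter([0.2, 1.1, 2.4, 1.9], toy_levels, squared_error)` yields the index sequence that `run_receiver` then decodes with the same `toy_levels`. The receiver reuses the level generator and the index store but needs no comparator, which mirrors the remark that it can be realized using a portion of the transmitter.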

VI. REMARKS ON K AND CCD 

One might ask to what degree the central hypotheses of Theorem 4 are
necessary to the conclusions. The theorem itself provides a partial
answer: conclusions i and ii do not use $x \in CCD(G)$ at all, and use
only a measurability and a boundedness property of $\psi$. The critical
conclusions are the uniqueness conclusions iii and iv. Clearly, something
is required of $\psi(x, y)$ that makes it, in some sense, smaller when $y = x$
than elsewhere, and not too indifferent to the value of $y$ when $y \neq x$,
if uniqueness is to be expected from the hypothesis of $\psi$-optimality.
As we have already noted, the hypothesis $\psi \in K$ is fairly weak in this
regard, and could, in the presence of CCD, be made weaker at the
expense of further elaboration of the proof.

Fig. 2. Form of receiver.

The interesting hypothesis is $x \in CCD(G)$. This implies that if $x$ has
a conditional probability distribution relative to the field $G$, then that
distribution is continuous. It is easy to see that the $\psi$-optimum
quantizing of a random variable $x$ need not be unique if the distribution of $x$
is not continuous, even when one uses $\psi(x, y) = (x - y)^2$. Since $y$
in Theorem 2 $\psi$-optimally quantizes $x$ for each event measurable on
the conditioning field $G$, something like $x \in CCD(G)$ is necessary if
conclusion iv is to follow. Thus we conclude a loose kind of necessity
for this hypothesis.
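
A small numerical check makes the non-uniqueness concrete. The example below is an illustration chosen here, not one taken from the paper: it assumes $x$ is uniform on the three points 0, 1, 2, allows two output levels, and uses $\psi(x, y) = (x - y)^2$. It enumerates every assignment of the three points to two cells, puts each output level at its cell mean (the best level for a given cell under squared error), and collects the partitions that reach the least mean distortion. The names `POINTS`, `mean_distortion` and `best_partitions` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed discrete distribution for the illustration: x uniform on 0, 1, 2.
POINTS = (0, 1, 2)


def mean_distortion(labels):
    """Mean squared error when each cell's output level is the cell mean."""
    total = Fraction(0)
    for cell in set(labels):
        members = [Fraction(x) for x, c in zip(POINTS, labels) if c == cell]
        level = sum(members) / len(members)  # centroid of the cell
        total += sum((x - level) ** 2 for x in members)
    return total / len(POINTS)


def best_partitions():
    """Return the least distortion and every partition that attains it."""
    scored = {}
    for labels in product((0, 1), repeat=len(POINTS)):
        cells = frozenset(
            frozenset(x for x, c in zip(POINTS, labels) if c == k)
            for k in (0, 1)
        ) - {frozenset()}  # drop the empty cell when only one label is used
        scored[cells] = mean_distortion(labels)
    best = min(scored.values())
    winners = [cells for cells, d in scored.items() if d == best]
    return best, winners
```

Calling `best_partitions()` returns the minimum distortion together with the list of partitions attaining it. Under these assumptions the minimum, 1/6, is reached both by the cells {0}, {1, 2} and by the cells {0, 1}, {2}, so the optimum quantizer is not unique. Exact rational arithmetic (`fractions.Fraction`) is used so that the tie is detected without floating-point tolerance.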

We notice finally that iii and iv were proved by confining the
redundancy among the $\{z_p, p \in Q\}$ to a null set. In the application of
this idea to the situation of Theorem 1, it seems likely that redundancy
in the $\{z_{pn}, p \in Q\}$ for some fixed $n$ might indeed be exploited to improve
some

$$E\{\psi_{n+k}(x_{n+k}, y_{n+k})\}, \quad k > 0, \tag{29}$$

by selection, among the minimizing $z_{pn}$ to which $E\{\psi_n(x_n, y_n)\}$ is
indifferent, one which actually contributes information about $x_{n+k}$ and
therefore allows a reduction in (29). I have no example to show this
phenomenon, so its existence remains a conjecture. We have proved,
of course, that its possible existence is ruled out by $x \in CCD(G)$.